NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Efficient and Robust Edge AI: Software, Hardware, and the Co-design

https://doi.org/10.1145/3724396

Kim, Bokyung; Li, Shiyu; Taylor, Brady; Chen, Yiran (May 2025, ACM Transactions on Embedded Computing Systems)

Artificial intelligence (AI) provides versatile capabilities in applications such as image classification and voice recognition that are most useful in edge or mobile computing settings. Shrinking these sophisticated algorithms into small form factors with minimal computing resources and power budgets requires innovation at several layers of abstraction: software, algorithmic, architectural, circuit, and device-level innovations. However, improvements to system efficiency may impact robustness and vice-versa. Therefore, a co-design framework is often necessary to customize a system for its given application. A system that prioritizes efficiency might use circuit-level innovations that introduce process variations or signal noise into the system, which may use software-level redundancy in order to compensate. In this tutorial, we will first examine various methods of improving efficiency and robustness in edge AI and their tradeoffs at each level of abstraction.Then, we will outline co-design techniques for designing efficient and robust edge AI systems, using federated learning as a specific example to illustrate the effectiveness of co-design.
more » « less
Full Text Available
Prosperity: Accelerating Spiking Neural Networks via Product Sparsity

https://doi.org/10.1109/HPCA61900.2025.00066

Wei, Chiyue; Guo, Cong; Cheng, Feng; Li, Shiyu; Yang, Hao Frank; Li, Hai Helen; Chen, Yiran (March 2025, IEEE)

Full Text Available
CSCO: Connectivity Search of Convolutional Operators

Zhang, Tunhou; Li, Shiyu; Cheng, Hsin-Pai; Yan, Feng; Li, Hai; Chen, Yiran (June 2024, Fifth Workshop on Neural Architecture Search)

Full Text Available
Fast Multichannel Inverse Design through Augmented Partial Factorization

https://doi.org/10.1021/acsphotonics.3c00911

Li, Shiyu; Lin, Ho-Chun; Hsu, Chia Wei (February 2024, ACS Photonics)

Full Text Available
NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing

https://doi.org/10.1109/ISCA59077.2024.00035

Wang, Yitu; Li, Shiyu; Zheng, Qilin; Song, Linghao; Li, Zongwang; Chang, Andrew; Li, Hai “Helen”; Chen, Yiran (June 2024, IEEE)

Approximate nearest neighbor search (ANNS) is a key retrieval technique for vector database and many data center applications, such as person re-identification and recommendation systems. It is also fundamental to retrieval augmented generation (RAG) for large language models (LLM) now. Among all the ANNS algorithms, graph-traversal-based ANNS achieves the highest recall rate. However, as the size of dataset increases, the graph may require hundreds of gigabytes of memory, exceeding the main memory capacity of a single workstation node. Although we can do partitioning and use solid-state drive (SSD) as the backing storage, the limited SSD I/O bandwidth severely degrades the performance of the system. To address this challenge, we present NDSEARCh, a hardware-software co-designed near-data processing (NDP) solution for ANNS processing. NDSeARCH consists of a novel in-storage computing architecture, namely, SEARSSD, that supports the ANNS kernels and leverages logic unit (LUN)-level parallelism inside the NAND flash chips. NDSEARCH also includes a processing model that is customized for NDP and cooperates with SearSSD. The processing model enables us to apply a two-level scheduling to improve the data locality and exploit the internal bandwidth in NDSearch, and a speculative searching mechanism to further accelerate the ANNS workload. Our results show that NDSEARCH improves the throughput by up to 31.7×,14.6×,7.4×, and 2.9× over CPU, GPU, a state-of-the-art SmartSSD-only design, and DeepStore, respectively. NDSEARCH also achieves two orders-of-magnitude higher energy efficiency than CPU and GPU.
more » « less
Full Text Available
NDRec: A Near-Data Processing System for Training Large-Scale Recommendation Models

https://doi.org/10.1109/TC.2024.3365939

Li, Shiyu; Wang, Yitu; Hanson, Edward; Chang, Andrew; Seok_Ki, Yang; Li, Hai; Chen, Yiran (May 2024, IEEE Transactions on Computers)

Full Text Available
High-efficiency high-numerical-aperture metalens designed by maximizing the efficiency limit

https://doi.org/10.1364/OPTICA.514907

Li, Shiyu; Lin, Ho-Chun; Hsu, Chia Wei (January 2024, Optica)

Theoretical bounds are commonly used to assess the limitations of photonic design. Here we introduce a more active way to use theoretical bounds, integrating them into part of the design process and identifying optimal system parameters that maximize the efficiency limit itself. As an example, we consider wide-field-of-view high-numerical-aperture metalenses, which can be used for high-resolution imaging in microscopy and endoscopy, but no existing design has achieved a high efficiency. By choosing aperture sizes to maximize an efficiency bound, setting the thickness according to a thickness bound, and then performing inverse design, we come up with high-numerical-aperture (NA=0.9) metalens designs with, to our knowledge, record-high 98% transmission efficiency and 92% Strehl ratio across all incident angles within a 60° field of view, reaching the maximized bound. This maximizing-efficiency-limit approach applies to any multi-channel system and can help a wide range of optical devices reach their highest possible performance.
more » « less
Full Text Available
Transmission Efficiency Limit for Nonlocal Metalenses

https://doi.org/10.1002/lpor.202300201

Li, Shiyu; Hsu, Chia Wei (September 2023, Laser & Photonics Reviews)

The rapidly advancing capabilities in nanophotonic design are enabling complex functionalities limited mainly by physical bounds. The efficiency of transmission is a major consideration, but its ultimate limit remains unknown for most systems. This study introduces a matrix formalism that puts a fundamental bound on the channel‐averaged transmission efficiency of any passive multi‐channel optical system based only on energy conservation and the desired functionality, independent of the interior structure and material composition. Applying this formalism to diffraction‐limited nonlocal metalenses with a wide field of view shows that the transmission efficiency must decrease with the numerical aperture for the commonly adopted designs with equal entrance and output aperture diameters. It also shows that reducing the size of the entrance aperture can raise the efficiency bound. This study reveals a fundamental limit on the transmission efficiency as well as provides guidance for the design of high‐efficiency multi‐channel optical systems.
more » « less
Full Text Available
SN 2023ehl: A Normal Type Ia Supernova with High-velocity Features

https://doi.org/10.3847/1538-4357/adcf1b

Zeng, Xiangyun; Hu, Lei; Zheng, Sheng; Li, Xiaolong; Luo, Xiaoyu; Li, Sai; Howell, D Andrew; Dong, Yize; Valenti, Stefano; Fu, Liping; et al (June 2025, The Astrophysical Journal)

Abstract SN 2023ehl, a normal Type Ia supernova with a typical decline rate, was discovered in the galaxy UGC 11555 and offers valuable insights into the explosion mechanisms of white dwarfs. We present a detailed analysis of SN 2023ehl, including spectroscopic and photometric observations. The supernova exhibits high-velocity features in its ejecta, which are crucial for understanding the physical processes during the explosion. We compared the light curves of SN 2023ehl with other well-observed Type Ia supernovae, finding similarities in their evolution. The line strength ratioR(Siii) was calculated to be 0.17 ± 0.04, indicating a higher photospheric temperature compared to other supernovae. The maximum quasi-bolometric luminosity was determined to be 1.52 × 10⁴³erg s⁻¹, and the synthesized⁵⁶Ni mass was estimated at 0.77 ± 0.05M_⊙. The photospheric velocity atB-band maximum light was measured as 10,150 ± 240 km s⁻¹, classifying SN 2023ehl as a normal velocity Type Ia supernova. Our analysis suggests that SN 2023ehl aligns more with both the gravitationally confined detonation, providing a comprehensive view of the diversity and complexity of Type Ia supernovae.
more » « less
Full Text Available
A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models

Guo, Cong; Cheng, Feng; Du, Zhixu; Kiessling, James; Ku, Jonathan; Li, Shiyu; Li, Zhixu; Ma, Mingyuan; Molom-Ochir, Tergel; Morris, Benjamin; et al (February 2025, IEEE circuits and systems magazine)

Full Text Available

« Prev Next »

Search for: All records